Combining reward shaping and hierarchies for scaling to large multiagent systems

Authors

  • Chris HolmesParker
  • Adrian K. Agogino
  • Kagan Tumer
Abstract

Coordinating the actions of agents in multiagent systems presents a challenging problem, especially as the size of the system increases and predicting agent interactions becomes difficult. Many approaches to improving coordination within multiagent systems have been developed, including organizational structures, shaped rewards, coordination graphs, heuristic methods, and learning automata. However, each of these approaches still has inherent limitations with respect to coordination and scalability. We explore the potential of synergistically combining existing coordination mechanisms so that they offset each other's limitations. More specifically, we are interested in combining existing coordination mechanisms in order to achieve improved performance, increased scalability, and reduced coordination complexity in large multiagent systems. In this work, we discuss and demonstrate the individual limitations of two well-known coordination mechanisms. We then provide a methodology for combining the two coordination mechanisms to offset their limitations and improve performance over either method individually. Here, we combine shaped difference rewards and hierarchical organization in two domains with up to 10,000 sensing agents. We show that combining hierarchical organization with difference rewards can improve both coordination and scalability by decreasing information overhead, structuring agent-to-agent connectivity and control flow, and improving the individual decision-making capabilities of agents. We show that by combining hierarchies and difference rewards, the information overheads and computational requirements of individual agents can be reduced by as much as 99% while simultaneously increasing overall system performance in two variations of the Defect Combination Problem. Additionally, we demonstrate that the approach remains robust under agent failures of up to 25% in various conditions.
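As a rough illustration of the combination described above (a minimal sketch, not the authors' implementation; the defect-coverage objective, the group structure, and all names below are assumptions), each agent's difference reward D_i = G(z) − G(z_{-i}) can be evaluated against only its own group in the hierarchy, which is where the reduction in per-agent information and computation comes from:

    # Minimal sketch (not the authors' code): difference rewards evaluated
    # within a hierarchical grouping of sensing agents. The defect-coverage
    # objective, the group structure, and all names are illustrative.

    def global_objective(observations):
        """Toy group objective: number of distinct defects covered."""
        return len({defect for obs in observations for defect in obs})

    def difference_reward(observations, i, objective=global_objective):
        """D_i = G(z) - G(z with agent i removed)."""
        without_i = observations[:i] + observations[i + 1:]
        return objective(observations) - objective(without_i)

    def hierarchical_difference_rewards(groups):
        """Reward each agent only against its own group, so the information
        an agent needs scales with group size, not with the full system."""
        return {
            (group_id, i): difference_reward(observations, i)
            for group_id, observations in groups.items()
            for i in range(len(observations))
        }

    # Example: two small teams of sensing agents, each observing a set of defect ids.
    groups = {
        "team_a": [{1, 2}, {2, 3}, {3}],
        "team_b": [{7}, {7, 8}],
    }
    print(hierarchical_difference_rewards(groups))
    # {('team_a', 0): 1, ('team_a', 1): 0, ('team_a', 2): 0,
    #  ('team_b', 0): 0, ('team_b', 1): 1}

In a flat organization the same computation would require every agent to know the full joint state; restricting the evaluation to a subgroup is the design choice the abstract credits with the reported reduction in per-agent overhead.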

Related articles

Multiagent Learning with a Noisy Global Reward Signal

Scaling multiagent reinforcement learning to domains with many agents is a complex problem. In particular, multiagent credit assignment becomes a key issue as the system size increases. Some multiagent systems suffer from a global reward signal that is very noisy or difficult to analyze. This makes deriving a learnable local reward signal very difficult. Difference rewards (a particular instanc...

Effects of Shaping a Reward on Multiagent Reinforcement Learning

In reinforcement learning problems, agents take sequential actions with the goal of maximizing a time-delayed reward. In this chapter, the design of reward shaping for a continuing task in a multiagent domain is investigated. We use an interesting example, keepaway soccer (Kuhlmann, 2003; Stone, 2002; Stone, 2006), in which a team tries to maintain ball possession by avoiding the opponent’s int...

Potential-based reward shaping for knowledge-based, multi-agent reinforcement learning

Reinforcement learning is a robust artificial intelligence solution for agents required to act in an environment, making their own decisions on how to behave. Typically an agent is deployed alone with no prior knowledge but, given sufficient time, a suitable state representation, and an informative reward function, it is guaranteed to learn how to maximise its long-term reward. Incorporating doma...

Potential-based difference rewards for multiagent reinforcement learning

Difference rewards and potential-based reward shaping can both significantly improve the joint policy learnt by multiple reinforcement learning agents acting simultaneously in the same environment. Difference rewards capture an agent’s contribution to the system’s performance. Potential-based reward shaping has been proven to not alter the Nash equilibria of the system but requires domain-speci...
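For reference, the two signals this entry combines are usually written as follows (standard definitions from the reward-shaping literature; the notation is not quoted from this entry):

    D_i(z)   = G(z) - G(z_{-i})      difference reward: agent i's contribution to the global objective G
    F(s, s') = γ·Φ(s') - Φ(s)        potential-based shaping: discounted change in a potential function Φ over states

Here z_{-i} denotes the joint state with agent i removed or replaced by a default action, and γ is the discount factor.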

Dynamic potential-based reward shaping

Potential-based reward shaping can significantly improve the time needed to learn an optimal policy and, in multiagent systems, the performance of the final joint-policy. It has been proven to not alter the optimal policy of an agent learning alone or the Nash equilibria of multiple agents learning together. However, a limitation of existing proofs is the assumption that the potential of a stat...
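The extension this entry refers to, in its standard formulation (not quoted from the entry itself), makes the potential a function of time as well as state, so the shaping term becomes:

    F(s, t, s', t') = γ·Φ(s', t') - Φ(s, t)

which keeps the policy-invariance guarantees even though Φ may change while the agents are learning.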

Journal:
  • Knowledge Eng. Review

Volume 31, Issue -

Pages -

Publication date: 2016